Readme (04/22/2025)

For this study, all the data processing is done using Python, while data analysis and visualization are done using R.

1. R code for data analysis:
(a) "Data analysis and visualization.R": reproduces all the analytical results and graphs in the paper, using only the datasets in section 2 as inputs.
(b) "Parallel processing for calculating edge connections in 3 GPT words.R": finds all the edges among the 23 sub-emotions for the ChatGPT-assisted approach and outputs 2(f).
(c) "Parallel processing for calculating edge connections in lyric split.R": finds all the edges among the 23 sub-emotions for the lexicon-only approach and outputs 2(g).

2. Datasets for data analysis (inputs to "Data analysis and visualization.R"):
(a) "df7_3EmotionsCH_region_singerType_Year2.csv": yearly average sentiment scores for the ChatGPT-assisted approach.
(b) "df7_Ch_Split_region_singerType_Year2_per_100Chars.csv": yearly average sentiment scores for the lexicon-only approach.
(c) "df7_3EmotionsCH_split_synonym_expanded_emo scores_final.csv": sentiment/emotion data of every lyric for the ChatGPT-assisted approach.
(d) "df7_ch_split_emo scores_final_per_100Chars.csv": sentiment/emotion data of every lyric for the lexicon-only approach.
(e) "edges.csv": vertices and edges of the 8 primary emotions and 23 sub-emotions for creating graphs of edge bundling.
(f) "subEmo_connections_gpt.csv": All edges between the 23 sub-emotions for the ChatGPT-assisted approach.
(g) "subEmo_connections_lyric_split.csv": All edges between the 23 sub-emotions for the lexicon-only approach.
(h) "manual check.xlsx": manual check of the predictive accuracy of the ChatGPT-assisted and lexicon-only approaches on 200 randomly selected lyrics. Column "IntensityWithSign1": total sentiment score via the ChatGPT-assisted approach. Column "IntensityWithSign2": total sentiment score via the lexicon-only approach. Column "Manual_Check": manual label of whether a lyric is positive ("P") or negative ("N").
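
As a rough illustration of how the columns in 2(h) can be compared, here is a minimal Python sketch. It is not the authors' code (the paper's own analysis is run from the R script in 1(a)). It assumes that a positive score corresponds to "P" and a negative score to "N", and that a score of exactly zero counts as a miss. The function names are hypothetical, and the rows are invented placeholders, not values from "manual check.xlsx".

```python
def sign_label(score):
    """Map a signed sentiment score to "P" or "N"; None for zero (assumption)."""
    if score > 0:
        return "P"
    if score < 0:
        return "N"
    return None


def accuracy(rows, score_column):
    """Share of rows whose score sign agrees with the Manual_Check label."""
    hits = sum(
        1 for row in rows if sign_label(row[score_column]) == row["Manual_Check"]
    )
    return hits / len(rows)


# Invented placeholder rows that only reuse the column names from 2(h).
rows = [
    {"IntensityWithSign1": 2.5, "IntensityWithSign2": -0.4, "Manual_Check": "P"},
    {"IntensityWithSign1": -1.0, "IntensityWithSign2": -3.2, "Manual_Check": "N"},
    {"IntensityWithSign1": 0.8, "IntensityWithSign2": 1.1, "Manual_Check": "N"},
    {"IntensityWithSign1": -0.6, "IntensityWithSign2": 0.0, "Manual_Check": "N"},
]

gpt_accuracy = accuracy(rows, "IntensityWithSign1")      # ChatGPT-assisted
lexicon_accuracy = accuracy(rows, "IntensityWithSign2")  # lexicon-only
print(gpt_accuracy, lexicon_accuracy)
```

With the real spreadsheet, the rows would come from the 200 manually checked lyrics instead of the placeholder list.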

3. Python code for data processing:
(a) "GPT.py": sends lyrics to OpenAI's ChatGPT API and receives three emotion words in return.
(b) "split3emos2emoWords.py": splits the string of three ChatGPT-output emotion words into analyzable, emotion-centered units/groups of words.
(c) "calc3emotionsChEmos_synonym_expanded.py": calculates the sentiment score for each lyric based on the three ChatGPT-output emotion words split by 3(b), used in conjunction with the lexicon-based method.
(d) "splitChLyrics.py": splits each lyric into analyzable, emotion-centered units/groups of words.
(e) "calcChLyricsEmos.py": calculates the sentiment score for each lyric using the lexicon-only method. The input is from 3(d).
(f) "KMeans Clustering of Lyrics based on Eight Emotions.ipynb": performs KMeans clustering of the lyrics based on the eight emotions.

4. Lexicons (all CSV files) used for calculating lyric sentiments (inputs to 3(c) and 3(e)):
(a) "negation words.csv": contains common negation words in Chinese.
(b) "degree words.csv": contains common degree adverbs in Chinese.
(c) "DLUT_reshaped_final.csv": the reshaped original CSVOL emotion lexicon used in this study. DLUT stands for Dalian University of Technology, the institution that developed the CSVOL lexicon.
(d) "Anti-DLUT_reshaped_final.csv": contains the same entries as in 4(c), but the emotion data are transposed to the antonyms, with emotional intensities all set to 1. It is used if there is an odd number of negation words in a group of words.
(e) "Anti-Anti-DLUT_reshaped_final.csv": contains the same entries as in 4(c), but the emotional intensities are all set to 1. It is used if there is an even number of negation words in a group of words.
(f) "DLUT_reshaped_final_Synonym_expanded": expanded version of 4(c) with synonyms incorporated.
(g) "Anti-DLUT_reshaped_final_Synonym_Expanded": expanded version of 4(d) with synonyms incorporated.
(h) "Anti-Anti-DLUT_reshaped_final_Synonym_Expanded": expanded version of 4(e) with synonyms incorporated.
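
The odd/even negation rule in 4(d) and 4(e) can be sketched in Python as below. This is only an illustration, not the code in 3(c) or 3(e). The function names and the sample negation words are hypothetical stand-ins, and the choice of 4(c) when a group contains no negation words is an assumption, since this README only describes the odd and even cases. The flowchart in section 5 remains the authoritative description.

```python
# File names are taken from section 4 of this README.
BASE_LEXICON = "DLUT_reshaped_final.csv"
ODD_NEGATION_LEXICON = "Anti-DLUT_reshaped_final.csv"
EVEN_NEGATION_LEXICON = "Anti-Anti-DLUT_reshaped_final.csv"

# Illustrative stand-ins for the contents of "negation words.csv".
SAMPLE_NEGATION_WORDS = {"不", "没", "没有", "别"}


def count_negations(words, negation_words=SAMPLE_NEGATION_WORDS):
    """Count how many words in a group are negation words."""
    return sum(1 for word in words if word in negation_words)


def choose_lexicon(negation_count):
    """Pick a lexicon file from the number of negation words in a group."""
    if negation_count == 0:
        # Assumption: with no negation, the original lexicon 4(c) applies.
        return BASE_LEXICON
    if negation_count % 2 == 1:
        # Odd number of negations: meaning is reversed, use the antonym lexicon.
        return ODD_NEGATION_LEXICON
    # Even, non-zero number of negations: the negations cancel out.
    return EVEN_NEGATION_LEXICON


single = choose_lexicon(count_negations(["我", "不", "快乐"]))
double = choose_lexicon(count_negations(["不", "是", "不", "爱"]))
plain = choose_lexicon(count_negations(["我", "快乐"]))
print(single, double, plain)
```

The synonym-expanded files in 4(f) to 4(h) could be swapped in for the three constants in the same way.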

5. Logic flowchart for 3(e), as well as 3(c), in the lexicon-only approach: 
(a) "Logic flow chart for calculating emotion score for each emotion group of words.pdf"
